The Open University


M249 Practical modern 
statistics 


Computer Book 1 


Medical statistics 


About this module 


M249 Practical modern statistics uses the software packages IBM SPSS Statistics (SPSS Inc.) 
and WinBUGS, and other software. This software is provided as part of the module, and its 
use is covered in the Introduction to statistical modelling and in the four computer books 
associated with Books 1 to 4. This computer book contains all the computer work associated 
with Book 1. 


Cover image courtesy of NASA. This photograph, acquired by the ASTER instrument on 
NASA’s Terra satellite, shows an aerial view of a large alluvial fan between the Kunlun and 
Altun mountains in China’s Xinjiang province. For more information, see NASA’s Earth 
Observatory website at http://earthobservatory.nasa.gov. 


This publication forms part of an Open University module. Details of this and other 
Open University modules can be obtained from the Student Registration and Enquiry 
Service, The Open University, PO Box 197, Milton Keynes MK7 6BJ, United Kingdom 
(tel. +44 (0)845 300 60 90; email general-enquiries@open.ac.uk). 


Alternatively, you may visit the Open University website at www.open.ac.uk where 
you can learn more about the wide range of modules and packs offered at all levels by 
The Open University. 


To purchase a selection of Open University materials visit www.ouw.co.uk, or contact 
Open University Worldwide, Walton Hall, Milton Keynes MK7 6AA, United Kingdom 
for a brochure (tel. +44 (0)1908 858779; fax +44 (0)1908 858787; email 
ouw-customer-services@open.ac.uk). 


The Open University, Walton Hall, Milton Keynes MK7 6AA. 
First published 2007. Second edition 2013. 
Copyright © 2007, 2013 The Open University 


All rights reserved. No part of this publication may be reproduced, stored in a 
retrieval system, transmitted or utilised in any form or by any means, electronic, 
mechanical, photocopying, recording or otherwise, without written permission from 
the publisher or a licence from the Copyright Licensing Agency Ltd. Details of such 
licences (for reprographic reproduction) may be obtained from the Copyright 
Licensing Agency Ltd, Saffron House, 6-10 Kirby Street, London EC1N 8TS 
(website www.cla.co.uk). 


Open University materials may also be made available in electronic formats for use 
by students of the University. All rights, including copyright and related rights and 
database rights, in electronic materials and their contents are owned by or licensed 
to The Open University, or otherwise used by The Open University as permitted by 
applicable law. 


In using electronic materials and their contents you agree that your use will be solely 
for the purposes of following an Open University course of study or otherwise as 
licensed by The Open University or its assigns. 

Except as permitted above you undertake not to copy, store in any medium 
(including electronic storage or use in a website), distribute, transmit or retransmit, 
broadcast, modify or show in public such electronic materials in whole or in part 
without the prior written consent of The Open University or in accordance with the 
Copyright, Designs and Patents Act 1988. 


Edited, designed and typeset by The Open University, using the Open University 
TEX System. 
Printed in the United Kingdom by The Charlesworth Group, Wakefield. 


ISBN 978 1 7800 7532 7 
2.1 


Contents 


Introduction 
Chapter 1 Creating tables 


Chapter 2 Analysing data from cohort studies and 
case-control studies 


Chapter 3 Creating stratified tables 
Chapter 4 Stratified analysis of tabular data 
Chapter 5 Dose-response analysis 

Computer Exercises on Book 1 

Learning outcomes 

Solutions to Computer Activities 

Solutions to Computer Exercises 


Index 


Introduction 


This computer book covers all the computer work associated with Book 1 Medical 
statistics of M249 Practical modern statistics. 


Using this book 


As you study Book 1, you will be directed to work through particular chapters in 
this computer book. You are advised not to work on the activities here until you 
have reached the appropriate points in Book 1. 


The activities vary in nature and in length. Some contain instructions on how to 
use the software to perform particular tasks. Others provide practice at using the 
software to explore or analyse data; you will find solutions to these at the end of 
the computer book. 


A few supplementary computer exercises on the whole of Book 1 are provided 
after Chapter 5. You may use these for extra practice, or for revision (or not at 
all — they are optional) as you wish. Solutions to these exercises are given at the 
end of the computer book. 


Conventions used in the computer books 


For clarity of presentation, bold-face type has been used for file names throughout 
M249. The names of menus and items in menus are also printed in bold-face when 
referred to in the text, as are options and the names of fields and buttons in 
dialogue boxes. 


When describing a computer procedure for the first time, full instructions will be 
given. When the procedure is next referred to, a brief reminder will be provided 
in the margin. For example, you might be asked to obtain the Frequencies 
dialogue box. A margin note such as that shown here will indicate that this is 
obtained by choosing Frequencies... from the Descriptive Statistics 
submenu of Analyze. 


Use Analyze > Descriptive 
Statistics > Frequencies.... 


When you are asked to use the mouse to click on an item, you should assume that 
this refers to the left-hand mouse button. If you need the right-hand mouse 
button this will be stated explicitly. 


Data files 


All the data files for this computer book are located in the Book 1 subfolder of 
the M249 Data Files folder in Documents. 


Chapter 1 
Creating tables 


In this chapter, you will learn how to create tables in SPSS. There are two stages 
— entering the data, and processing them. 


Computer Activity 1.1 Entering tabular data in SPSS 


The data in Table 1.1 are from a cohort study to examine the association between 
pre-eclampsia and eclampsia during the first pregnancy and hypertension in later 
life. 


Table 1.1 Gestational pre-eclampsia or eclampsia and hypertension in later life 

Exposure category                Hypertension   No hypertension   Total 
Pre-eclampsia or eclampsia            327             215          542 
No pre-eclampsia or eclampsia          76             201          277 


In this activity you will create an SPSS data file containing these data. 
Run SPSS now. The Data Editor will open. 


In Table 1.1, the disease variable uses two columns and the exposure variable uses 
two rows. In SPSS, tabular data cannot be entered directly in this format. SPSS 
reserves columns for variables, and rows for the units on which the variables are 
measured, so a separate column must be used for each variable. Table 1.2 
contains the same information as Table 1.1, but the data are arranged in the 
format required by SPSS. 
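The rearrangement from a two-way table to one row per cell can also be sketched outside SPSS. The following Python fragment is an illustration only and is not part of the module software; the helper name to_long_format is made up for this sketch. It assumes that the first row and first column of the table are coded 1 and the second row and second column are coded 2.

```python
# Illustrative sketch only: SPSS itself is driven through its menus.
# to_long_format is a hypothetical helper, not an SPSS or library function.

def to_long_format(table):
    """Turn a two-way table of counts (a list of rows) into
    (count, exposure, disease) records, one record per cell.
    Row i gets exposure code i + 1; column j gets disease code j + 1."""
    records = []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            records.append((count, i + 1, j + 1))
    return records


# Table 1.1 without its totals: rows are exposure categories,
# columns are disease categories.
table_1_1 = [[327, 215],
             [76, 201]]

long_rows = to_long_format(table_1_1)
for count, exposure, disease in long_rows:
    print(count, exposure, disease)
```

Each printed line corresponds to one row of the data sheet that you are about to fill in: a count followed by its two codes.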


The exposure category has been coded 1 for Pre-eclampsia or eclampsia and 2 for 
No pre-eclampsia or eclampsia. Similarly, the disease outcome has been coded 1 
for Hypertension and 2 for No hypertension. Enter the counts in the first column 
of the Data View data sheet, as follows. 


© Position the mouse pointer in the top left-hand cell and click on the cell to 
activate it, then type 327 in the cell. 


© Move the cursor to the next cell in the column by pressing Enter or the 
‘down’ cursor key (↓). 


Notice that the variable name in the column heading changes from var to 
VAR00001. Note also that the number you have just entered changes from 327 
to 327.00. By default, SPSS shows two decimal places. You will learn how to 
change this default later in this activity. 


© Type 215 in the second cell of the first column, and press Enter (or ↓). 


© Enter the values 76 and 201 in the next two cells. (Do not enter the totals 
from Table 1.1. SPSS will calculate these later.) 


© Now activate the top cell of the second column (by clicking on it), and enter 
the codes for the exposure variable in the first four cells of the column: 1, 1, 
2, 2. 


© Enter the codes for the disease variable in the first four cells of the third 
column: 1, 2, 1, 2. 


These data are reproduced from 
Table 1.1 of Book 1. 


If the Data View panel is not 
uppermost, then click on the 
Data View tab in the lower 
left-hand corner of the window. 


Table 1.2 Data in the format required for entry in SPSS 

Count   Exposure   Disease 
 327       1          1 
 215       1          2 
  76       2          1 
 201       2          2 




The data are now entered. The Data View panel will be as shown in Figure 1.1. 


[Screenshot: the SPSS Data Editor, Data View panel. The columns VAR00001, VAR00002 
and VAR00003 contain the four rows 327.00, 1.00, 1.00; 215.00, 1.00, 2.00; 
76.00, 2.00, 1.00; and 201.00, 2.00, 2.00.] 

Figure 1.1 The data from Table 1.1 entered in SPSS 


Next, enter the variable names, as follows. 


© Click on the Variable View tab near the bottom left-hand corner of the 
window so that the Variable View panel is uppermost. 

© Replace the default variable name VAR00001 by typing count in its place. 

© Similarly, replace VAR00002 by exposure, and VAR00003 by disease. 


The data you have entered are all whole numbers, yet they appear with two 
decimal places. Change the default number of decimal places to zero for each 
variable, as follows. 

© For each of the top three cells in the Decimals column in turn, activate the 
cell, then click twice on the down arrow within the cell. (This changes the 
value to 0.) 

The next stage is to define the coding you have used. This is done in the column 
headed Values. The variable count has no codes associated with it, so leave the 
first cell of the Values column set to None. For the variable exposure, you used 
two codes — 1 corresponding to ‘Pre-eclampsia or eclampsia’ and 2 corresponding 
to ‘No pre-eclampsia or eclampsia’. SPSS allows you to specify the labels for these 
codes, as follows. 


© Activate the second cell in the Values column. A blue box will appear in the 
cell. 


© Click on this blue box and the Value Labels dialogue box will open. 

© Type 1 in the Value field. 

© Type Pre-eclampsia or eclampsia in the Label field. 

© To confirm the value label, click on the Add button. The value and the value 
label will appear in the area to the right of the Add button. 

© Now type 2 in the Value field, type No pre-eclampsia or eclampsia in 
the Label field, and click on Add. 




Note that if you need to edit a value label — for instance, if you make a mistake 
— then you can do so as follows: select the label in the area to the right of the 
Add button, then edit it in the Label field. 


© Finally, click on OK to close the Value Labels dialogue box. 
Now specify the labels for codes 1 and 2 for the variable disease, as follows. 


© Activate the third cell of the Values column (by clicking on it), and click on 
the blue box that appears. Another Value Labels dialogue box will open. 


© Enter the value label Hypertension for the value 1 and the value label 
No hypertension for the value 2. 


© Click on OK to close the Value Labels dialogue box. 


The final stage is to define the variable types in the Measure column. This is the 
last but one column on the right. You may need to scroll across to this column. 
Alternatively, enlarge the Data Editor window. 

© Click on the first cell in the Measure column (corresponding to the variable 
count). 

There are three options in the drop-down list that appears: Scale, Ordinal and 
Nominal. Scale variables comprise numerical data; ordinal and nominal variables 
comprise categorical data. Categorical data are ordinal if the categories are 
ordered (for example, the categories bad, average, good, excellent), and nominal 
if the categories are not ordered (for example, exposed and unexposed, or male 
and female). In this computer book, all counts are coded as scale variables, and 
all binary exposure variables and disease variables are coded as nominal. 


© Click on Scale to select this type for the variable count. 


© Now select Nominal for the variable exposure: click on the second cell in the 
Measure column, and select Nominal from the drop-down list. 


© Similarly, select Nominal for the variable disease. 


Now click on the Data View tab to return to the Data View panel. Note that 
the data now appear as whole numbers. 


If you forget what the codes 1 and 2 for the variables exposure and disease 
represent, you can view the value labels you have entered by clicking on the 
Value Labels button on the toolbar. (This is the button with the road sign on 
it, fourth from the right. Place the mouse pointer over this button and the 
description ‘Value Labels’ will appear.) Click on the Value Labels button now. 

Alternatively, click on Value 
Labels within the View menu 
to show or hide the labels. 

To read the full value labels, you may need to increase the column widths. For 
example, to increase the width of the exposure column, place the mouse pointer 
on the vertical line separating the variable names exposure and disease (at the 
top of these columns), hold the mouse button down, and drag the mouse to adjust 
the width of the exposure column. 


The final step, which is very important, is to tell SPSS that the data are 
counts, rather than measurements. A count of 327 represents 327 individuals, not 
one measurement with value 327, and thus is given greater weight than a count of, 
say, 215. In SPSS terminology, you need to weight the cases. Do this now, as 
follows. 

The Weight Cases dialogue 
box can also be opened by 
clicking on the scales button on 
the toolbar. 

© Choose Weight Cases... from the Data menu to open the Weight Cases 
dialogue box. 


© Select Weight cases by and enter count in the Frequency Variable field. 
© Click on OK. 


SPSS will now recognize the data as count data. The Viewer window will open, 
and show the command you have just executed. If necessary, make the Data 
Editor window active again (by clicking on it, or on its name or checkbox on the 
Windows menu). 
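The effect of weighting the cases can be pictured with a short Python sketch. This is an illustration of the idea only, not a description of how SPSS stores or processes the data: treating each row as standing for count individuals is equivalent to expanding the four rows into 819 individual records.

```python
# Illustrative sketch: what weighting by a count variable means.
# Each record is (count, exposure, disease), as in the data sheet.
records = [(327, 1, 1), (215, 1, 2), (76, 2, 1), (201, 2, 2)]

# Without weighting, each row is treated as a single unit.
unweighted_n = len(records)

# With weighting, each row stands for `count` individuals.
weighted_n = sum(count for count, _, _ in records)

# Equivalent view: expand every row into `count` identical individuals.
individuals = [(exposure, disease)
               for count, exposure, disease in records
               for _ in range(count)]

print(unweighted_n, weighted_n, len(individuals))  # prints: 4 819 819
```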




Finally, save your work in an SPSS data file named hyper1.sav, as follows. 


© Choose Save As... from the File menu of the Data Editor to open the 
Save Data As dialogue box. 


© To save the file in the Book 1 folder, Book 1 must appear in the Look in 
field. If it does not, then navigate to the Documents folder, double-click on 
M249 Data Files in the main panel, and then double-click on Book 1. 


© Type hyper1 in the File name field. 


© Check that the Save as type field shows SPSS Statistics(*.sav). If it 
does not, click on the down arrow and select SPSS Statistics(*.sav). 


© Click on Save. 


The saved file hyper1.sav is located in the Book 1 subfolder of the M249 Data 
Files folder in Documents. You will need the file in Computer Activity 1.2, so 
do not close it. 


Computer Activity 1.2 Displaying tabular data in SPSS 


In this activity, you will learn how to display the data in the data file hyper1.sav 
as a table. This is done using Crosstabs... from the Descriptive Statistics 
submenu of Analyze, as follows. 


© Choose Crosstabs... from the Descriptive Statistics submenu of 
Analyze to obtain the Crosstabs dialogue box. 


© Enter exposure in the Row(s) field. 


© Enter disease in the Column(s) field. Ignore all the other fields and check 
boxes. 


© Click on OK, and the Viewer window will become active. 


The following table, which is called a crosstabulation in SPSS, will be displayed in 
the Viewer window. 


exposure * disease Crosstabulation 

Count 
                                                     disease 
                                            Hypertension   No hypertension   Total 
exposure   Pre-eclampsia or eclampsia            327             215          542 
           No pre-eclampsia or eclampsia          76             201          277 
Total                                            403             416          819 



Note that SPSS has calculated row totals and column totals, and these are 
included in the table. Save the output if you wish. Then exit from SPSS. 
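The tabulation that Crosstabs performs on weighted data can be sketched in Python as follows. This is a minimal illustration under the coding used in this chapter; the function name crosstabulate is hypothetical and does not correspond to any SPSS command.

```python
from collections import defaultdict

# (count, exposure, disease) records, as entered in Computer Activity 1.1.
records = [(327, 1, 1), (215, 1, 2), (76, 2, 1), (201, 2, 2)]


def crosstabulate(records):
    """Sum the counts into cells, row totals, column totals and a grand total."""
    cells = defaultdict(int)
    row_totals = defaultdict(int)
    col_totals = defaultdict(int)
    for count, exposure, disease in records:
        cells[(exposure, disease)] += count
        row_totals[exposure] += count
        col_totals[disease] += count
    grand_total = sum(row_totals.values())
    return dict(cells), dict(row_totals), dict(col_totals), grand_total


cells, row_totals, col_totals, grand_total = crosstabulate(records)
print(row_totals)   # totals for each exposure category
print(col_totals)   # totals for each disease category
print(grand_total)  # total number of individuals
```

The row totals agree with the Total column of Table 1.1.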


Note that the data file 
hypertension.sav also contains 
these data. 


In the Viewer window, use File 
> Save As.... The file 
extension required is spv. 




Computer Activity 1.3 Entering and displaying tabular data 

This activity will give you some practice at entering tabular data and displaying 
them in a table. 

Table 1.3 contains data from a case-control study of lung cancer and smoking. 


Table 1.3 Smoking and lung cancer in males 

Exposure category   Cases of lung cancer   Controls 
Smoked                      647               622 
Never smoked                  2                27 
Total                       649               649 


If necessary, obtain a blank data sheet in the Data Editor by choosing Data 
from the New submenu of File. Enter these data in SPSS, using the variable 
names count, exposure and disease, and the value labels shown in Table 1.3. 
Save the data in a file using the file name smoking1.sav. Obtain a 
crosstabulation of the data. 


Summary of Chapter 1 


In this chapter, you have learned how to enter data in tabular form in SPSS, how 
to save the data in a file, and how to display the data in a table. 


The data in Table 1.3 are reproduced from 
Table 3.3 of Book 1. 


Chapter 2 
Analysing data from cohort studies and case-control studies 

In this chapter, you will learn how to use SPSS to analyse data from cohort 
studies and case-control studies. 


Computer Activity 2.1 Odds ratios and relative risks for cohort 
studies 


Data from a cohort study on serious self-inflicted injury (SSII) and compulsory 
redundancy among meat-processing workers in New Zealand are in the SPSS data 
file redundancy.sav. In Section 1 of Book 1, you calculated estimates of the 
relative risk RR and the odds ratio OR. In this activity, you will use SPSS to 
obtain the odds ratio and the relative risk, together with 95% confidence intervals. 
This is done using Crosstabs... from the Descriptive Statistics submenu of 
Analyze. 


See Activity 1.5 in Book 1. 




It is a good idea to begin by checking that the data are correctly entered by 
displaying the data table. Do this now, as follows. 


© Open the data file redundancy.sav. 
© Obtain the Crosstabs dialogue box. 


© Enter exposure as the row variable and disease as the column variable, as 
described in Computer Activity 1.2. 


© Click on OK. 
The following table will be displayed in the Viewer window. 


exposure * disease Crosstabulation 

Count 
                                          disease 
                                Self-inflicted   No self-inflicted 
                                injury           injury              Total 
exposure   Made redundant             14             1931             1945 
           Not made redundant          4             1763             1767 
Total                                 18             3694             3712 


Now find the odds ratio, relative risk and 95% confidence intervals, as follows. 


© Obtain the Crosstabs dialogue box. Note that the entries you have just 
made are still there. 


© Click on Statistics... near the top right of the dialogue box to open the 
Crosstabs: Statistics dialogue box. 


© Check the Risk box. Leave all the other boxes unchecked. 
© Click on Continue to return to the Crosstabs dialogue box. 
© Click on OK. 


The following table will be displayed in the Viewer window, below the 
crosstabulated data. 


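Risk Estimate 

                                                      95% Confidence Interval 
                                            Value       Lower       Upper 
Odds Ratio for exposure (Made redundant / 
Not made redundant)                         3.195       1.050       9.726 
For cohort disease = Self-inflicted injury  3.180       1.049       9.642 
For cohort disease = No self-inflicted 
injury 
N of Valid Cases                             3712 

[The entries in the third row of this table are not reproduced here.] 
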


The odds ratio and its confidence limits are in the first row of the table: 
OR = 3.195, with 95% confidence interval (1.050, 9.726). 


The relative risk and its confidence limits are in the second row: RR = 3.180, 
with 95% confidence interval (1.049, 9.642). 


The value in the third row is the relative risk for avoidance of serious self-inflicted 
injury; you should ignore this. 
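The estimates and confidence limits quoted above can be checked by hand. The following Python sketch assumes the usual large-sample method, in which the interval is calculated on the log scale and then transformed back; the function names are hypothetical, and the sketch is not a statement of how SPSS computes its output. With z = 1.96 it reproduces the quoted values to three decimal places.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% confidence interval.
    a, b: exposed with and without the disease;
    c, d: unexposed with and without the disease."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (or_hat,
            or_hat * math.exp(-z * se_log),
            or_hat * math.exp(z * se_log))


def relative_risk_ci(a, b, c, d, z=1.96):
    """Relative risk and approximate 95% confidence interval (cohort data only)."""
    rr_hat = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return (rr_hat,
            rr_hat * math.exp(-z * se_log),
            rr_hat * math.exp(z * se_log))


# Redundancy data: made redundant (14, 1931), not made redundant (4, 1763).
or_hat, or_lower, or_upper = odds_ratio_ci(14, 1931, 4, 1763)
rr_hat, rr_lower, rr_upper = relative_risk_ci(14, 1931, 4, 1763)

print(f"OR = {or_hat:.3f} ({or_lower:.3f}, {or_upper:.3f})")
print(f"RR = {rr_hat:.3f} ({rr_lower:.3f}, {rr_upper:.3f})")
```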




Use Analyze > Descriptive 
Statistics > Crosstabs.... 




Computer Activity 2.2 Odds ratios for case-control studies in SPSS 


In SPSS, odds ratios and 95% confidence intervals are found using Crosstabs... 
in exactly the same way for case-control studies as for cohort studies. Data from a 
case-control study of the association between alcohol consumption and fatal car 
accidents are in the SPSS data file drinkdriving.sav. Open this file now. 


Follow the method described in Computer Activity 2.1 to obtain output including 
the odds ratio. Note that you must enter outcome in the Column(s) field of the 
Crosstabs dialogue box (instead of disease). The output will include the 
following table. 


Risk Estimate 

                                                               95% Confidence Interval 
                                                      Value      Lower      Upper 
Odds Ratio for exposure (Blood alcohol >= 100mg% / 
Blood alcohol < 100mg%) 
For cohort outcome = Cases: killed in car accident    9.927 
For cohort outcome = Controls                          .389 
N of Valid Cases 

[The remaining entries of this table are not reproduced here.] 



The estimated odds ratio and its 95% confidence limits are in the first row of the 
table. These correspond to the values obtained in Example 3.3 of Book 1. 


Notice that SPSS has also calculated ‘relative risks’. These are given in the next 
two rows. SPSS has no way of recognizing that this is a case-control study, and 
hence that the relative risk cannot be estimated. The ‘relative risk’ estimates 
(9.927 and 0.389) and their confidence intervals are meaningless. 


You should ignore rows 2 and 3 of a Risk Estimate table when analysing 
case-control data. These rows are only meaningful with cohort data. 


This example illustrates the danger of using the output produced by a statistical 
package such as SPSS uncritically. 
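Why the odds ratio survives case-control sampling while the ‘relative risk’ does not can be seen in a small numerical sketch. The example below is hypothetical: it takes the cohort counts from Computer Activity 2.1 and imagines that only 1 in 100 of the individuals without the disease had been sampled as controls (fractional counts are used purely for illustration). The odds ratio is unchanged, whereas the quantity calculated by the relative risk formula changes with the sampling fraction.

```python
def odds_ratio(a, b, c, d):
    """a, b: exposed cases and non-cases; c, d: unexposed cases and non-cases."""
    return (a * d) / (b * c)


def relative_risk_formula(a, b, c, d):
    """The relative risk formula, meaningful only for cohort data."""
    return (a / (a + b)) / (c / (c + d))


# Cohort counts from Computer Activity 2.1.
a, b, c, d = 14, 1931, 4, 1763

# Hypothetical case-control version: keep every case, but sample
# only a fraction of the non-cases as controls.
sampling_fraction = 0.01
b_sampled = b * sampling_fraction
d_sampled = d * sampling_fraction

or_cohort = odds_ratio(a, b, c, d)
or_sampled = odds_ratio(a, b_sampled, c, d_sampled)
rr_cohort = relative_risk_formula(a, b, c, d)
rr_sampled = relative_risk_formula(a, b_sampled, c, d_sampled)

print(f"odds ratio:        cohort {or_cohort:.3f}, sampled {or_sampled:.3f}")
print(f"'relative risk':   cohort {rr_cohort:.3f}, sampled {rr_sampled:.3f}")
```

The second printed line shows two different values, which is why rows 2 and 3 of the Risk Estimate table carry no meaning for case-control data.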


Computer Activity 2.3 Testing for no association in SPSS 


In SPSS, the chi-squared test for no association is performed using Crosstabs.... 


In this activity, you will use SPSS to carry out a chi-squared test for no 
association between redundancy and serious self-inflicted injury among 
meat-processing workers in New Zealand. The data file redundancy.sav should 
still be open. (If not, open it now.) If necessary, make it active by clicking on the 
Data Editor displaying it (or use the Window menu). 


First, obtain expected frequencies in SPSS, as follows. 


© Obtain the Crosstabs dialogue box, and check that exposure is in the 
Row(s) field and disease in the Column(s) field. (If not, then enter them.) 


© Click on Cells... near the top right of the Crosstabs dialogue box to open 
the Crosstabs: Cell Display dialogue box. 


© The Observed check box should be checked. (If it is not checked, then check 
it.) Check the Expected check box below it. 


© Click on Continue, then on OK. 


These data are discussed in 
Example 3.3 of Book 1. 


In Book 1 the results were 
rounded to two decimal places. 


Use Analyze > Descriptive 
Statistics > Crosstabs.... 




The following table will be displayed in the Viewer window. 


exposure * disease Crosstabulation 

                                                          disease 
                                                Self-inflicted   No self-inflicted 
                                                injury           injury              Total 
exposure   Made redundant       Count                14             1931             1945 
                                Expected Count        9.4           1935.6           1945.0 
           Not made redundant   Count                 4             1763             1767 
                                Expected Count        8.6           1758.4           1767.0 
Total                           Count                18             3694             3712 
                                Expected Count       18.0           3694.0           3712.0 


This table shows the expected frequencies as well as the observed frequencies. So, 
for instance, you can see that more instances of serious self-inflicted injury were 
observed than expected among workers who were made redundant — 14 
compared to 9.4. 
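The expected frequencies are obtained from the marginal totals: under the hypothesis of no association, the expected count in a cell is the row total multiplied by the column total, divided by the overall total. A minimal Python sketch of this calculation follows; the function name expected_counts is made up for the illustration.

```python
def expected_counts(observed):
    """Expected frequencies under no association:
    (row total * column total) / overall total for each cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: made redundant, not made redundant.
# Columns: self-inflicted injury, no self-inflicted injury.
observed = [[14, 1931],
            [4, 1763]]

expected = expected_counts(observed)
for row in expected:
    print([round(value, 1) for value in row])
```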


Now test the hypothesis of no association between serious self-inflicted injury and 
redundancy, as follows. 


© Obtain the Crosstabs dialogue box. (All the settings you chose earlier in 
this activity will be preserved.) 


© Click on Statistics... to open the Crosstabs: Statistics dialogue box. 


© Check the Chi-square check box (in the top left-hand corner of the dialogue 
box). 


© Click on Continue, then on OK. 
The following table will be displayed in the Viewer window. 


Chi-Square Tests 

                                 Value      df   Asymp. Sig.   Exact Sig.   Exact Sig. 
                                                 (2-sided)     (2-sided)    (1-sided) 
Pearson Chi-Square               4.671(a)    1      .031 
Continuity Correction(b) 
Likelihood Ratio 
Fisher's Exact Test 
Linear-by-Linear Association 
N of Valid Cases                 3712 

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 8.57. 
b. Computed only for a 2x2 table 

[The remaining entries of this table are not reproduced here.] 


The row labelled Pearson Chi-Square contains details of the chi-squared test for 
no association described in Section 4 of Book 1. The value of the chi-squared test 
statistic is 4.671. Note a below the table confirms that no cells have an expected 
frequency less than 5. The approximation upon which the chi-squared test relies 
is therefore valid. 


To the right of the chi-squared test statistic, in the column headed df, are the 
degrees of freedom. Since this is a 2 × 2 table, there is (2 − 1) × (2 − 1) = 1 
degree of freedom. The next column is headed Asymp. Sig. (2-sided). This 
column contains the significance probability, or p value, for the test of the null 
hypothesis of no association. (Asymp. Sig. is short for ‘asymptotic significance’. 
The term ‘asymptotic’ simply means that the p value is valid provided that the 
expected values in each cell are sufficiently large.) 


The p value is 0.031. There is moderate evidence of an association between 
redundancy and serious self-inflicted injury. 
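The test statistic and p value can be reproduced with a few lines of Python. This sketch assumes the standard Pearson statistic, the sum over cells of (observed − expected)²/expected, and uses the fact that, with 1 degree of freedom, the upper tail probability of the chi-squared distribution equals erfc(√(x/2)). The function names are hypothetical.

```python
import math


def expected_counts(observed):
    """Expected frequencies under the hypothesis of no association."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_squared(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))


def p_value_one_df(statistic):
    """Upper tail probability of chi-squared with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


observed = [[14, 1931],
            [4, 1763]]

statistic = pearson_chi_squared(observed)
p_value = p_value_one_df(statistic)
print(f"chi-squared = {statistic:.3f}, p = {p_value:.3f}")
```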


The rows labelled Continuity Correction and Likelihood Ratio refer to two 
tests of association that are not covered in M249. The row labelled 
Linear-by-Linear Association relates to a test that will be discussed in 
Chapter 5. 




For the approximation upon 
which the chi-squared test relies 
to be valid, all the expected 
frequencies must be at least 5. 




The row labelled Fisher’s Exact Test contains results for the exact test for no 
association that was discussed in Subsection 4.3 of Book 1. This test is valid even 
when the table contains cells with expected frequencies less than 5. If you wish to 
use Fisher’s exact test, you should use the p value in the column headed Exact 
Sig. (2-sided). In this example, the chi-squared test and Fisher’s exact test give 
similar answers. You could use either test, but you should specify which test you 
have used when reporting the results. 


Computer Activity 2.4 Fisher’s exact test 


For the data analysed in Computer Activity 2.3, Fisher’s exact test gives similar 
results to the chi-squared test. In this activity you will analyse data for which the 
two tests do not give similar answers. 


Data on the proportions of measles cases who died before and after a mass 
vaccination campaign are in the SPSS data file measlesdeath.sav. Open this file 
now. Use the method described in Computer Activity 2.3 to obtain the following 
crosstabulation, which shows both the observed and the expected frequencies. 

After opening this file, to avoid 
cluttering your computer screen, 
you can close any Data Editor 
windows from previous activities 
that are still open. 

[SPSS output table: exposure * disease Crosstabulation, giving Count and Expected 
Count for the rows After campaign, Before campaign and Total. The first column is 
headed Hospitalized for measles and died. The total count is 742.] 

The expected number of deaths after the campaign is 0.9. The following message 
will also appear at the bottom of the Viewer window. 





>Warning # 3211 

>On at least one case, the value of the weight variable was zero, negative, or 
>missing. Such cases are invisible to statistical procedures and graphs which 
>need positively weighted cases, but remain on the file and are processed by 
>non-statistical facilities such as LIST and SAVE. 


This warning alerts you to the fact that there is a zero count in the data table. 
You may ignore it. Now carry out the test for no association using the procedure 
described in Computer Activity 2.3. 


The following table will be displayed in the Viewer window. 


Chi-Square Tests 

                                 Value      df   Asymp. Sig.   Exact Sig.   Exact Sig. 
                                                 (2-sided)     (2-sided)    (1-sided) 
Pearson Chi-Square               .934(a)     1      .334 
Continuity Correction(b) 
Likelihood Ratio 
Fisher's Exact Test                                               1.000 
Linear-by-Linear Association 
N of Valid Cases                 742 

a. 1 cells (25.0%) have expected count less than 5. The minimum expected count is .87. 
b. Computed only for a 2x2 table 

[The remaining entries of this table are not reproduced here.] 


The value of the chi-squared test statistic is 0.934, but note a warns that one cell 
has an expected count less than 5. It follows that the p value of 0.334 quoted in 
the table may not be reliable. 




You can see that this is indeed the case by looking at the p value for Fisher’s 
exact test which is given in the column headed Exact Sig. (2-sided). The p 
value is 1.000. (This implies that no other table with the same marginal totals is 
more likely to have occurred than this one.) This p value differs substantially 
from the p value obtained for the chi-squared test, though both lead to the same 
conclusion: there is little evidence of an association. 
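The remark that a p value of 1.000 means no other table is more likely can be illustrated with a short Python sketch of Fisher's exact test. The sketch assumes the common two-sided definition: with the marginal totals fixed, add up the probabilities of every table that is no more likely than the observed one. This is an assumption about the method, not a quotation of SPSS documentation, and the 2 × 2 table with a zero count used below is hypothetical (it is not the measles data).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].
    With all margins fixed, the top left-hand cell follows a
    hypergeometric distribution."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denominator = comb(n, row1)

    def probability(k):
        # Probability that the top left-hand cell equals k.
        return comb(col1, k) * comb(n - col1, row1 - k) / denominator

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    p_observed = probability(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (probability(k) for k in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-7))


# Hypothetical table with a zero count: the observed table is the
# single most likely table, so every table is counted and p is 1.
p_zero_cell = fisher_exact_two_sided(0, 10, 5, 60)

# The redundancy data from Computer Activity 2.3.
p_redundancy = fisher_exact_two_sided(14, 1931, 4, 1763)

print(f"zero-count example: p = {p_zero_cell:.3f}")
print(f"redundancy data:    p = {p_redundancy:.3f}")
```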


So far, you have used SPSS to carry out a test for no association only for 2 × 2 
tables. In Computer Activity 2.5 you will use SPSS to carry out a test for no 
association in a larger table. As you will see, in this case the SPSS output is 
slightly different. 


Computer Activity 2.5 Larger tables 


Data on gestational age and hospitalization for asthma in childhood are in the 
SPSS data file asthmagest.sav. Open this file now. There are three variables: 
count, exposure and disease. The variable exposure has three categories: 
Pre-term, Term and Post-term. In the Data Editor, bring the Variable View 
window uppermost and look at the column headed Measure. Note that the entry 
for exposure is Ordinal, because the three categories can be arranged in order of 
increasing gestational age. (The analysis of the data is not affected by this.) 

These data are discussed in 
Example 4.1 of Book 1. 


Obtain the following crosstabulation. 

Use the method described in 
Computer Activity 1.2 for a 
2 × 2 table. 

exposure * disease Crosstabulation 

Count 
                                 disease 
                        Hospitalized   Not hospitalized 
                        for asthma     for asthma          Total 
exposure   Pre-term          18              284 
           Term                             8967 
           Post-term                        1145 
Total                                      10396 

[Some entries of this table are illegible in the source and are left blank.] 


Now carry out a test for no association. The method is the same as that described 
in Computer Activity 2.3 for a 2 × 2 table. 

Remember to check the 
Chi-square check box in the 
Crosstabs: Statistics dialogue 
box. 

The following table will be displayed in the Viewer window. 

Chi-Square Tests 

                                 Value      df   Asymp. Sig. 
                                                 (2-sided) 
Pearson Chi-Square               3.104(a)    2      .212 
Likelihood Ratio 
Linear-by-Linear Association 
N of Valid Cases 

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 12.70. 

[The remaining entries of this table are not reproduced here.] 

The first row contains the value of the chi-squared test statistic: 3.104. Since this 
is a 3 × 2 table, there are (3 − 1) × (2 − 1) = 2 degrees of freedom, as indicated in 
the df column. The p value is 0.212. So there is little evidence of an association 
between gestational age and hospitalization for childhood asthma. 

You may ignore the other rows 
in the table. 
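The degrees of freedom and the p value for this larger table can be checked with a brief Python sketch. It relies on a standard property of the chi-squared distribution: with 2 degrees of freedom, the upper tail probability beyond x is exp(−x/2). The function names are hypothetical, and the shortcut applies only when there are exactly 2 degrees of freedom.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for the chi-squared test in an r x c table."""
    return (n_rows - 1) * (n_cols - 1)


def p_value_two_df(statistic):
    """Upper tail probability of chi-squared with exactly 2 degrees of freedom."""
    return math.exp(-statistic / 2)


df = degrees_of_freedom(3, 2)       # 3 exposure categories, 2 disease categories
p_value = p_value_two_df(3.104)     # test statistic taken from the SPSS output
print(f"df = {df}, p = {p_value:.3f}")
```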




Note that the output does not include the results of Fisher’s exact test. Although 
Fisher’s exact test is not needed in this example, it would be needed if the 
expected value for any of the cells was less than 5. Fisher’s exact test is not given 
as a default because the calculations involved can be time-consuming for tables 
with more than two rows or more than two columns. 

Now carry out Fisher’s exact test for this 3 x 2 table, as follows. 

◦ Obtain the Crosstabs dialogue box.
◦ Click on the Exact... button (at the top right of the dialogue box).

The Exact Tests dialogue box will open. There are three radio buttons:
Asymptotic Only (for when only large-sample chi-squared calculations are
required), Monte Carlo (which you should ignore), and Exact.

◦ Select Exact (by clicking on it or on its radio button).

Since the computations can be time-consuming for large tables, you can select a
time limit: the default is set at five minutes. Leave this unchanged.

◦ Click on Continue, and then on OK.

The following output will be displayed in the Viewer window.


Chi-Square Tests


                                Value    df    Asymp. Sig.    Exact Sig.    Exact Sig.    Point
                                               (2-sided)      (2-sided)     (1-sided)     Probability
Pearson Chi-Square              3.104    2     .212
Likelihood Ratio
Fisher's Exact Test             3.102                         .207
Linear-by-Linear Association
N of Valid Cases

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 12.70.
b. The standardized statistic is 1.511.














Look in the row labelled Fisher’s Exact Test. SPSS gives the exact significance 
probability as 0.207. So there is little evidence of an association between 
gestational age and hospitalization for childhood asthma. Note that, in this 
example, Fisher’s exact test and the chi-squared test give similar results. The 
number 3.102 given in the Value column relates only indirectly to Fisher’s exact 
test, and you should ignore it. 


Summary of Chapter 2 


In this chapter, you have learned how to use SPSS to do the following: estimate 
relative risks (in cohort studies) and odds ratios (in cohort studies and 
case-control studies); calculate expected frequencies; perform the chi-squared test 
for no association and Fisher’s exact test. You have also learned how to analyse 
data in larger tables. 




Chapter 3 
Creating stratified tables 


In this chapter, you will learn how to create stratified tables in SPSS. The 
procedure is similar to that described in Chapter 1 for creating unstratified tables. 


Computer Activity 3.1 Entering stratified tabular data in SPSS 


Table 3.1 contains data on the numbers of successes and failures of two treatments 
for kidney stones, stratified by stone size: small (< 2cm diameter) or large 
(> 2cm). 


Table 3.1 Outcome of kidney stone treatment by stone size

Stone size       Treatment          Success   Failure   Total

Small (< 2cm)    Keyhole surgery    234       36        270
                 Open surgery       81        6         87

Large (> 2cm)    Keyhole surgery    55        25        80
                 Open surgery       192       71        263


In Computer Activity 1.1 you saw that, in order to enter tabular data in SPSS, 
the data must be rearranged into columns. This is also the case for stratified data. 
The data from Table 3.1 have been written in the required format in Table 3.2. 


Table 3.2 Data rearranged for entry in SPSS

Count   Exposure   Outcome   Stratum
234     1          1         1
36      1          2         1
81      2          1         1
6       2          2         1
55      1          1         2
25      1          2         2
192     2          1         2
71      2          2         2


The exposure variable (in this case, surgery type) has been coded 1 for Keyhole 
surgery and 2 for Open surgery. The outcome (Success or Failure) has been 
coded 1 for Success and 2 for Failure. The last column contains values of the 
stratifying variable (stone size), coded 1 for Small stones and 2 for Large stones. 
Enter these data in SPSS, as follows. 


◦ In the Data Editor, make sure the Data View panel is uppermost. (If
  necessary, click on the Data View tab.)
◦ Enter the counts 234, 36, ..., 71 from Table 3.2 in the first column of the
  SPSS data sheet.
◦ Enter the codes for the exposure variable in the second column.
◦ Enter the codes for the outcome variable in the third column.
◦ Enter the stratum codes in the fourth column.
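
The rearrangement of Table 3.1 into one row per cell can also be sketched
outside SPSS. The following minimal Python illustration is not part of the
module software; the dictionary layout and the function name are assumptions
made for the example, while the numeric codes follow the description above.

```python
# Table 3.1 as nested dictionaries: stratum -> exposure -> (success, failure).
table_3_1 = {
    "Small stones": {"Keyhole surgery": (234, 36), "Open surgery": (81, 6)},
    "Large stones": {"Keyhole surgery": (55, 25), "Open surgery": (192, 71)},
}

# Numeric codes, as described in the text.
exposure_codes = {"Keyhole surgery": 1, "Open surgery": 2}
stratum_codes = {"Small stones": 1, "Large stones": 2}

def to_long_format(table):
    """Return one (count, exposure, outcome, stratum) row per table cell."""
    rows = []
    for stratum, exposures in table.items():
        for exposure, counts in exposures.items():
            # Outcome is coded 1 for Success and 2 for Failure.
            for outcome_code, count in enumerate(counts, start=1):
                rows.append((count, exposure_codes[exposure],
                             outcome_code, stratum_codes[stratum]))
    return rows

long_rows = to_long_format(table_3_1)
for row in long_rows:
    print(*row)
```

Each printed line corresponds to one row of the SPSS data sheet, in the order
count, exposure, outcome, stratum.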




These data are reproduced from 
Tables 6.5 and 6.6 of Book 1. 




Now define the variables. First, enter the variable names and labels, as follows. 


◦ Click on the Variable View tab to bring the Variable View panel
  uppermost.
◦ Change the default variable names: replace VAR00001 by count, VAR00002 by
  exposure, VAR00003 by outcome, and VAR00004 by stratum.
◦ In the Decimals column, change the default values to 0.


Now define the codes by assigning value labels in the column headed Values, as 
follows. 


◦ The variable count was not assigned any codes, so leave this one set to None.
◦ For the variable exposure, enter the value label Keyhole surgery for 1 and
  the value label Open surgery for 2.
◦ For the variable outcome, enter the value label Success for 1 and the value
  label Failure for 2.
◦ Finally, for the variable stratum, enter the value label Small stones for 1
  and the value label Large stones for 2.


Next define the variable types in the column headed Measure. The variable count
is a scale variable; the other variables are nominal. This completes the variable
definition.


Now return to the Data View panel. You can display the value labels you have 
just entered by clicking on Value Labels in the toolbar, or by clicking on Value 
Labels within the View menu. 


As for unstratified tables, you must weight the cases, so that SPSS recognizes the 
data as counts. Do this now, as follows. 


◦ Obtain the Weight Cases dialogue box.
◦ Select Weight cases by, and enter count in the Frequency Variable field.
◦ Click on OK.


SPSS will now recognize the data as count data. Save your work as an SPSS data
file in the Book 1 folder using the file name stones1.sav. You will need the file
in Computer Activity 3.2, so do not close it.
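
The effect of weighting can be illustrated with a short Python sketch (an
illustration only, not how SPSS is implemented). Without weights, the eight
rows of Table 3.2 would be treated as eight cases; with weights, each row
stands for as many cases as its count. The variable names are hypothetical.

```python
from collections import defaultdict

# Rows of Table 3.2: (count, exposure, outcome, stratum).
rows = [
    (234, 1, 1, 1), (36, 1, 2, 1), (81, 2, 1, 1), (6, 2, 2, 1),
    (55, 1, 1, 2), (25, 1, 2, 2), (192, 2, 1, 2), (71, 2, 2, 2),
]

# Without weights, each row would count as a single case.
unweighted_n = len(rows)

# With weights, each row stands for `count` identical cases.
weighted_n = sum(count for count, _, _, _ in rows)

# Weighted exposure-by-outcome table, aggregated over the two strata.
aggregated = defaultdict(int)
for count, exposure, outcome, _stratum in rows:
    aggregated[(exposure, outcome)] += count

print(unweighted_n, weighted_n)
print(dict(aggregated))
```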


Computer Activity 3.2 Displaying stratified tables in SPSS 


Display the data you entered in Computer Activity 3.1 and saved in the file 
stones1.sav in a table, as follows. 


◦ Obtain the Crosstabs dialogue box.
◦ Enter exposure in the Row(s) field and outcome in the Column(s) field.
◦ Enter stratum in the Layer 1 of 1 field. Ignore the Previous and Next
  buttons. (These are used for defining further levels of stratification, which are
  not considered in M249.)
◦ Click on OK.

The following table will be displayed in the Viewer window.

exposure * outcome * stratum Crosstabulation


Count
                                                 outcome
stratum                                          Success   Failure   Total
Small stones    exposure    Keyhole surgery      234       36        270
                            Open surgery         81        6         87
Large stones    exposure    Keyhole surgery      55        25        80
                            Open surgery         192       71        263
Total           exposure    Keyhole surgery      289       61        350
                            Open surgery         273       77        350











See Computer Activity 1.1 for 
detailed instructions. 


Use Data > Weight Cases.... 


Use File > Save As.... The 
data are also in the data file 
stones.sav. 




Computer Activity 3.3 Water fluoridation and dental caries 
This activity will give you some practice at entering stratified data in SPSS and 
displaying them in a table. 


Data on water fluoridation and dental caries in children aged 8-12 years are 
shown in Table 3.3. 


Table 3.3 Water fluoridation and dental caries


Age 8 years

Water type         With caries   Without caries   Total
Fluoridated        5             25               30
Not fluoridated    8             23               31

Age 9 years

Water type         With caries   Without caries   Total
Fluoridated        0             17               17
Not fluoridated    17            33               50

Age 10 years

Water type         With caries   Without caries   Total
Fluoridated        5             13               18
Not fluoridated    24            14               38

Age 11-12 years

Water type         With caries   Without caries   Total
Fluoridated        5             16               21
Not fluoridated    29            25               54


(a) Set out the data in Table 3.3 in the format required for entry in SPSS. Note 
that the stratum variable has four levels, which you should code 1 for ‘Age 
8 years’, 2 for ‘Age 9 years’, 3 for ‘Age 10 years’, and 4 for ‘Age 11-12 years’. 
(b) Obtain a blank data sheet in the Data Editor by choosing Data from the 
New submenu of File. 


Create an SPSS data file containing the data. Use the variable names count, 
exposure, outcome and stratum, and the value labels shown in Table 3.3. To 
do this, follow the procedure described in Computer Activity 3.1, with the 
following difference. The stratum variable in this case is ordinal, since the age 
categories can be placed in order of increasing age. Thus you should select 
Ordinal for this variable in the Measure column of Variable View. 


(c) Save the data in a file named caries1.sav. 


(d) Obtain a crosstabulation of the data. 


Summary of Chapter 3 


In this chapter, you have learned how to enter stratified data in SPSS and display 
them in a table. 




These data are reproduced from 
Table 7.4 of Book 1. 


The data file caries.sav also 
contains these data. 


Chapter 4 
Stratified analysis of tabular data 


In this chapter you will learn how to use SPSS to obtain the Mantel-Haenszel
estimate of the odds ratio and a 95% confidence interval for the underlying odds
ratio OR. SPSS will also be used to test the null hypothesis OR = 1 using the
Mantel-Haenszel chi-squared test, and to test the null hypothesis of homogeneity
using Tarone’s test. Finally, you will learn how to carry out McNemar’s test for
1-1 matched case-control studies.


This may seem like a rather long list. However, as you will see, SPSS produces
much of this output by default. Producing the output is a simple matter in SPSS,
but skill is required to interpret it correctly.


Computer Activity 4.1 The Mantel-Haenszel odds ratios and 
confidence intervals 


In this activity, you will use SPSS to analyse data from a cohort study of 
mortality in persons with diabetes. The exposure group included individuals with 
non-insulin dependent diabetes, and the control group included individuals with 
insulin-dependent diabetes. The outcome was death before the end of the study. 
The data were stratified by age in two groups: those aged 40 years or younger, 
and those aged over 40 years. The data are in the SPSS data file diabetes.sav. 
Open this file now. 


When analysing tabular data, it is good practice to begin by producing a 
crosstabulation of the data. So use Crosstabs... to obtain the following table. 


exposure * outcome * stratum Crosstabulation


Count
                                                                 outcome
stratum                                                          Died   Alive   Total
Diabetics aged 40y    exposure    Non-insulin dependent          0      15      15
or younger                        Insulin dependent              1      129     130
Diabetics aged        exposure    Non-insulin dependent
over 40y                          Insulin dependent
Total                 exposure    Non-insulin dependent
                                  Insulin dependent


Next, obtain the stratum-specific odds ratios, as follows. 


◦ Obtain the Crosstabs dialogue box. The settings you chose for the
  crosstabulation will have been retained.
◦ Click on Statistics... to open the Crosstabs: Statistics dialogue box.
◦ Check the Risk check box.
◦ Click on Continue, then click on OK.


You calculated stratum-specific 
odds ratios for these data in 
Activity 6.4 of Book 1. 


The method is described in 
Computer Activity 3.2. Ignore 
the warning about the zero cell. 




The following table will be displayed in the Viewer window. 


Risk Estimate


                                                                     95% Confidence Interval
stratum                                                    Value     Lower     Upper
Diabetics aged 40y    For cohort outcome = Alive           1.008     .993      1.023
or younger            N of Valid Cases                     145

Diabetics aged        Odds Ratio for exposure              .836      .611      1.143
over 40y              (Non-insulin dependent /
                      Insulin dependent)
                      For cohort outcome = Died
                      For cohort outcome = Alive           1.081     .941      1.242
                      N of Valid Cases

Total                 Odds Ratio for exposure
                      (Non-insulin dependent /
                      Insulin dependent)
                      For cohort outcome = Died
                      For cohort outcome = Alive
                      N of Valid Cases





For diabetics aged 40 years or younger, SPSS has calculated the relative risk of
remaining alive, namely 1.008, with 95% confidence interval (0.993, 1.023). SPSS
has not calculated the odds ratio or its confidence interval because the odds ratio
is zero, since (0 x 129)/(15 x 1) = 0, and the method for calculating the confidence
interval does not work in this case.
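
These figures can be checked by hand. The following minimal Python sketch (an
illustration, not part of the module software) uses the cell counts implied by
the calculation above. The large-sample interval on the log scale is an assumed
standard formula; the text does not state which formula SPSS uses. All variable
names are hypothetical.

```python
import math

# Cells for diabetics aged 40 years or younger, taken from the calculation
# in the text: one (died, alive) pair per exposure group.
died_exposed, alive_exposed = 0, 15        # non-insulin dependent
died_unexposed, alive_unexposed = 1, 129   # insulin dependent

# Odds ratio for death: (a x d) / (b x c).
odds_ratio = (died_exposed * alive_unexposed) / (alive_exposed * died_unexposed)

# Relative risk of remaining alive, exposed versus unexposed.
risk_exposed = alive_exposed / (died_exposed + alive_exposed)
risk_unexposed = alive_unexposed / (died_unexposed + alive_unexposed)
relative_risk = risk_exposed / risk_unexposed

# Large-sample 95% confidence interval on the log scale (assumed formula).
se_log_rr = math.sqrt(
    1 / alive_exposed - 1 / (died_exposed + alive_exposed)
    + 1 / alive_unexposed - 1 / (died_unexposed + alive_unexposed)
)
z = 1.96
rr_lower = math.exp(math.log(relative_risk) - z * se_log_rr)
rr_upper = math.exp(math.log(relative_risk) + z * se_log_rr)

# The usual standard error of log(OR) sums the reciprocals of all four
# cells, so it is undefined when any cell count is zero.
can_form_or_interval = min(died_exposed, alive_exposed,
                           died_unexposed, alive_unexposed) > 0

print(odds_ratio, round(relative_risk, 3), can_form_or_interval)
print(round(rr_lower, 3), round(rr_upper, 3))
```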


For diabetics aged over 40, the odds ratio is 0.836, with 95% confidence interval
(0.611, 1.143). You should ignore the next row of the table, labelled For cohort
outcome = Died. The relative risk of remaining alive is labelled For cohort
outcome = Alive and is 1.081, with 95% confidence interval (0.941, 1.242). The
lower panel of the table, labelled Total, contains the unadjusted odds ratio and
relative risk from the aggregated data.


Now undertake a stratified analysis, as follows. 

◦ Obtain the Crosstabs dialogue box.
◦ Click on Statistics... to open the Crosstabs: Statistics dialogue box.
◦ Uncheck the Risk check box (by clicking on it).
◦ Check Cochran’s and Mantel-Haenszel statistics. Leave all the other
  check boxes unchecked, and leave 1 in the Test common odds ratio
  equals field.
◦ Click on Continue, then click on OK.

SPSS produces rather a lot of output in the Viewer window. After the
crosstabulation of the data you will find the following table.


Tests of Homogeneity of the Odds Ratio


                Chi-Squared    df    Asymp. Sig.
                                     (2-sided)
Breslow-Day     .097           1
Tarone’s        .097           1     .755


The bottom row contains the results of Tarone’s test for homogeneity. The null
distribution of the test statistic, under the null hypothesis that the odds ratio is
the same in all k strata, is approximately χ²(k - 1), where k is the number of
strata. Here k = 2, so there is 1 degree of freedom, as indicated in the column
headed df. The value of the test statistic is 0.097, and the p value for the test
is 0.755. Hence there is little evidence that the odds ratios differ between the two
strata. (You should ignore the Breslow-Day test, which is not covered in M249.
In some cases it will give slightly different results, though in this case the two
tests agree.)




Tarone’s test is discussed in 
Subsection 7.3 of Book 1. 




Scroll down the output in the Viewer window until you find the following table. 


Tests of Conditional Independence


                   Chi-Squared    df    Asymp. Sig.
                                        (2-sided)
Cochran's          1.299          1     .254
Mantel-Haenszel    1.121          1     .290


This table gives the results of two significance tests of the null hypothesis of no
association between diabetes type and mortality, allowing for the effect of age.
You need consider only the Mantel-Haenszel test. The null distribution of the test
statistic, under the null hypothesis of no association, is approximately χ²(1). For
these data, the value of the test statistic is 1.121, and the corresponding p value
is 0.290. So there is little evidence of association between diabetes type and
mortality in diabetics. (Ignore the text below the box, which says that the
Mantel-Haenszel test is more widely applicable than the other test quoted.)
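
The p values quoted for the tests with 1 degree of freedom can be reproduced
from the test statistics. The following minimal Python sketch (not part of the
module software) uses the fact that a chi-squared variable with 1 degree of
freedom is the square of a standard normal variable. The function name is
hypothetical.

```python
import math

def chi2_upper_tail_df1(x):
    """Upper-tail probability P(X >= x) for a chi-squared variable with 1 df."""
    # A chi-squared variable with 1 df is the square of a standard normal Z,
    # so P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2))

p_mantel_haenszel = chi2_upper_tail_df1(1.121)  # Mantel-Haenszel test
p_cochran = chi2_upper_tail_df1(1.299)          # Cochran's test
p_tarone = chi2_upper_tail_df1(0.097)           # Tarone's test for homogeneity

print(round(p_mantel_haenszel, 3), round(p_cochran, 3), round(p_tarone, 3))
```

Because the test statistics are themselves rounded to three decimal places, the
recomputed p values can differ from the SPSS output in the last digit.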


Scroll further down the output in the Viewer window. The final table is as 
shown below. 


Mantel-Haenszel Common Odds Ratio Estimate


Estimate                                                               .834
ln(Estimate)
Std. Error of ln(Estimate)
Asymp. Sig. (2-sided)
Asymp. 95% Confidence    Common Odds Ratio        Lower Bound          .610
Interval                                          Upper Bound          1.140
                         ln(Common Odds Ratio)    Lower Bound
                                                  Upper Bound





The estimate in the first row of the table is the Mantel-Haenszel odds ratio for 
the association between diabetes type and mortality, adjusted for the effect of age. 
Thus the estimate of the common odds ratio is OR = 0.834. Lower down the 
table, in the two rows labelled Asymp. 95% Confidence Interval, are the lower 
and upper 95% confidence limits for the common odds ratio. The 95% confidence 
interval is (0.610, 1.140). You should ignore everything else in the table, as well as 
the text below the table. 
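
The Mantel-Haenszel estimate combines the strata as the sum of a x d / n
divided by the sum of b x c / n, where a, b, c, d are the cells of each 2 x 2
table and n is the stratum total. Since the counts for the older diabetics are
not reproduced in this computer book, the minimal Python sketch below applies
the formula to the kidney stone data of Table 3.1 instead. It is an
illustration only; the function and variable names are hypothetical.

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel odds ratio from a list of 2 x 2 tables.

    Each stratum is (a, b, c, d): a and b are the two outcome counts for the
    first exposure group, c and d those for the second exposure group.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

# Kidney stone data from Table 3.1: (success, failure) for keyhole surgery,
# then (success, failure) for open surgery, in each stone-size stratum.
kidney_strata = [
    (234, 36, 81, 6),    # small stones
    (55, 25, 192, 71),   # large stones
]

or_mh = mantel_haenszel_odds_ratio(kidney_strata)

# Unadjusted odds ratio from the aggregated table, for comparison.
a, b, c, d = (sum(cells) for cells in zip(*kidney_strata))
or_unadjusted = (a * d) / (b * c)

print(round(or_mh, 2), round(or_unadjusted, 2))
```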


The final stage in the analysis is to summarize the results. In such summaries, it 
is often appropriate to round the results produced by SPSS. While there are no 
universal rules for rounding, it is common to round p values to two significant 
figures, and to round relative risks and odds ratios to three significant figures (but 
seldom to quote more than two decimal places). Thus the results may be 
summarized as follows. 


Overall, there is little evidence of association between type of diabetes 
(insulin-dependent and non-insulin dependent) and mortality in persons with 
diabetes (Mantel-Haenszel test, p = 0.29). The odds ratio, adjusted for age, is 
0.83, with 95% confidence interval (0.61, 1.14). The stratum-specific odds ratios 
for patients aged 40 years or younger and for patients aged over 40 are both less 
than 1. There is little evidence of an interaction with age (Tarone’s test, p = 0.76). 


The Mantel-Haenszel
chi-squared test is mentioned at
the end of Subsection 7.1 of
Book 1.


See Subsection 7.1 of Book 1. 




In Computer Activity 4.1, the stratified analysis was described in stages. In 
practice, it is easier to check both the Risk check box and the Cochran’s and 
Mantel-Haenszel statistics check box in the Crosstabs: Statistics dialogue 
box. SPSS will then produce, in order, the following output. 


◦ A case processing summary
◦ A crosstabulation of the data
◦ The stratum-specific and unadjusted odds ratios with confidence intervals
◦ Tarone’s test for homogeneity of the odds ratio
◦ The Mantel-Haenszel test for no association
◦ The Mantel-Haenszel odds ratio and confidence interval


Computer Activity 4.2 Drink-driving and marital status 


Data from a case-control study on drink-driving are discussed in Book 1. Cases 
are individuals who were killed in car accidents for which they were responsible. 
Controls are drivers sampled at the locations where the accidents occurred, at the 
same time of day, and on the same day of the week. Individuals were considered 
exposed if their blood alcohol level was 100 mg% or more. The data are stratified 
by marital status. 


The data are stored in the SPSS data file drinkdriving2.sav. Undertake the 
following analyses. 


(a) Obtain stratum-specific odds ratios for the association between blood alcohol 
level and driver fatalities in married and unmarried individuals, with 95% 
confidence intervals. Also obtain the unadjusted odds ratio. 


(b) Use Tarone’s test for homogeneity to investigate whether there is any 
evidence of an interaction between marital status and alcohol level. 


(c) Obtain the Mantel-Haenszel estimate of the odds ratio for the association 
between blood alcohol level and driver fatalities, adjusted for marital status. 
Also obtain a 95% confidence interval for the odds ratio. 


(d) Test the null hypothesis of no association between blood alcohol level and 
driver fatalities using the Mantel-Haenszel test. 


(e) Summarize your findings. 


In Subsection 7.2 of Book 1, you learned how to estimate the odds ratio and 
calculate a 95% confidence interval for the odds ratio for a 1-1 matched 
case-control study, and also how to carry out McNemar’s test. SPSS calculates 
the exact p value for the test. However, you must calculate the odds ratio and the 
confidence interval by hand. Instructions for carrying out McNemar’s test in 
SPSS are given in Computer Activity 4.3. 


Computer Activity 4.3 McNemar’s test in SPSS 


Data from a 1-1 matched case-control study on contraceptive pill use and blood
clots are shown in Table 4.1.

Table 4.1 Contraceptive pill use and hospital admissions for blood clots

                             Controls
                        Exposed   Not exposed
Cases    Exposed        10        57
         Not exposed    13        95

To analyse data from a 1-1 matched case-control study using McNemar’s test, the
data must be entered in SPSS in the correct format. The data entry format
required for the data in Table 4.1 is as shown in Table 4.2.




The data are introduced in 
Example 3.3 of Book 1. The 
effect of marital status is 
discussed in Examples 7.1 
and 7.2. 


These data are reproduced from 
Table 7.7 of Book 1. The study 
is described in Example 7.4. 




The entries in the Cases and Controls columns of Table 4.2 are coded 1 for 
Exposed, 2 for Not exposed. Note that this data entry format is different from the 
formats used so far in this computer book, which are appropriate for unmatched 
data. Using the wrong data format will result in meaningless output being 
produced. Unfortunately, SPSS cannot recognize this, and will generate output 
whether or not the correct format has been used. So, be careful! 


The data are saved in the correct format for analysis using McNemar’s test in the 
data file pill.sav. Open this file now and look at the Data View panel. Notice 
that the two variables cases and controls are set out as in Table 4.2. 


Now analyse these data using McNemar’s test, as follows.

◦ Obtain the Crosstabs dialogue box.
◦ Enter cases in the Row(s) field and controls in the Column(s) field.
◦ Click on Statistics... to open the Crosstabs: Statistics dialogue box.
◦ Check the McNemar check box. (Do not check any other boxes: none of
  them will produce meaningful output.)
◦ Click on Continue, then click on OK.


The following crosstabulation will be displayed in the Viewer window.


cases * controls Crosstabulation


Count
                              controls
                        Exposed   Not exposed   Total
cases    Exposed        10        57            67
         Not exposed    13        95            108
Total                   23        152           175





The next table in the Viewer window, which gives the results of McNemar’s test, 
is as shown below. 


Chi-Square Tests


                      Value    Exact Sig.
                               (2-sided)
McNemar Test                   .000a
N of Valid Cases      175


a. Binomial distribution used.





McNemar’s test is a significance test of the null hypothesis of no association. The 
exact p value for the test is given in the first row of the table. This is quoted as 
0.000, which means that p < 0.0005. Thus there is strong evidence of an 
association between oral contraceptive use and blood clots. 


The row labelled N of Valid Cases gives the number of matched pairs in the 
table (that is, the sum of the four entries in Table 4.1). 


SPSS does not provide any other output relating to the analysis of 1-1 matched
case-control studies. Other quantities must be calculated by hand, using the
entries in the output table entitled cases*controls Crosstabulation (or using
Table 4.1). For example, the Mantel-Haenszel odds ratio, adjusted for the
matching variables, is OR_MH = 57/13 ≈ 4.38. Thus the association between oral
contraceptive use and blood clots is positive.
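
The hand calculation, together with an exact test based on the binomial
distribution, can be sketched in Python as follows. This is an illustration
only: the SPSS output states that the binomial distribution is used, but the
doubling of the smaller tail shown here is an assumed convention for the
two-sided p value, and the function and variable names are hypothetical.

```python
from math import comb

# Discordant pairs from Table 4.1.
case_exposed_only = 57      # case exposed, control not exposed
control_exposed_only = 13   # case not exposed, control exposed

# Mantel-Haenszel odds ratio for 1-1 matched pairs.
odds_ratio = case_exposed_only / control_exposed_only

def exact_mcnemar_p(f, g):
    """Two-sided exact p value: binomial test of f out of f + g with p = 1/2."""
    n = f + g
    smaller = min(f, g)
    # Probability of a split at least as uneven as the one observed, in the
    # lower tail, doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, k) for k in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)

p_value = exact_mcnemar_p(case_exposed_only, control_exposed_only)

print(round(odds_ratio, 2), p_value < 0.0005)
```

Only the discordant pairs (57 and 13) enter either calculation; the concordant
pairs (10 and 95) carry no information about the association.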


Table 4.2 Data entry format 
for a 1-1 matched case-control 
study 


Count Cases Controls 
10 1 1 
57 1 2 
13 2 1 
95 2 2 


If the value labels are displayed, 
click on the Value Labels 
button (the one with the road 
sign) on the taskbar. 


The analysis of 1-1 matched 
case-control data is discussed in 
Subsection 7.2 of Book 1. 




Computer Activity 4.4 Educational level and Alzheimer’s disease 


Alzheimer’s disease is a form of dementia affecting mainly (but not exclusively)
elderly people. It is currently incurable and its causes are not well understood.
Several studies have suggested that the risk of Alzheimer’s disease is higher
among persons with lower educational level, though the relationship appears to be
a complex one. A case-control study was undertaken in Sweden to investigate the
hypothesis that educational level and risk of Alzheimer’s are associated.

Gatz, M., Svedberg, P., Pedersen, N.L., Mortimer, J.A., Berg, S. and
Johansson, B. (2001) Education and the risk of Alzheimer’s disease: findings
from the study of dementia in Swedish twins. Journal of Gerontology, 56B,
292-300.

The study was conducted in pairs of twins of the same sex, in which one twin
developed Alzheimer’s and the other did not. In each pair of twins, the case was
the twin with Alzheimer’s and the control was the twin without Alzheimer’s.
Such a study design matches for numerous possible confounders, including genetic
and environmental factors, that may affect both educational achievement and risk
of Alzheimer’s. The study included mainly elderly twins, born in the early part of
the twentieth century.


In this study, an individual’s educational level was categorized as Low if he or she 
had received six or fewer years of education, and High if he or she had received 
more than six years of education. (In the early twentieth century, relatively few 
Swedes received more than six years of education.) The exposure is defined as low 
educational level. 


The data are shown in Table 4.3. 


Table 4.3 Educational level (Low or High) and Alzheimer’s disease in Swedish
same-sex twins


                    Controls
                 Low    High
Cases    Low     73     8
         High    3      6


(a) Without using your computer, obtain the Mantel-Haenszel odds ratio for the 
association between educational level and Alzheimer’s disease. 


(b) The data are in the SPSS file Alzheimers.sav. Open the file and check that 
the correct data entry format has been used. 


(c) Test the null hypothesis that there is no association between educational level 
and risk of Alzheimer’s disease. What do you conclude? 


Summary of Chapter 4 


In this chapter, you have used SPSS to analyse stratified data. You have learned 
how to obtain the Mantel-Haenszel estimate of the odds ratio and a 95% 
confidence interval for the odds ratio, how to test the null hypothesis of no 
association for stratified data, and how to test for homogeneity of the odds ratio 
using Tarone’s test. You have also learned how to carry out McNemar’s test for 
1-1 matched case-control studies. 




Chapter 5 
Dose-response analysis 


The chi-squared test for no linear trend is described in Section 8 of Book 1. SPSS
features a test which is more generally applicable: the test for linear-by-linear
association. In tables with ordered rows and with two columns, this more general
test coincides with the chi-squared test for no linear trend.


However, SPSS does not calculate dose-specific odds ratios or confidence intervals. 
If these are required you must calculate them by hand. 


Computer Activity 5.1 Testing for trend in SPSS 


Dose-response data from a case-control study on the association between the 
number of cigarettes smoked and lung cancer are shown in Table 5.1. 


Table 5.1 Smoking and lung cancer, by dose

Average number of
cigarettes per day    Cases   Controls
50+                   32      13
25-49                 136     71
15-24                 196     190
5-14                  250     293
0-4                   35      82
Total                 649     649

These data are reproduced from Table 8.1 of Book 1.


These data are in the SPSS data file smoking2.sav. Open this file now.

Look at the Data View panel of the Data Editor, and note how the data have
been entered. There are three variables: count, dose and casecon. The variable
dose gives the average number of cigarettes smoked. The variable casecon is
coded 1 for cases and 2 for controls.

Now analyse these data in SPSS, as follows.

◦ Obtain the Crosstabs dialogue box.
◦ Enter dose in the Row(s) field and casecon in the Column(s) field.
◦ Click on Format... to open the Crosstabs: Table Format dialogue box.
◦ Select Descending, and click on Continue. (The rows will be re-ordered to
  reproduce the layout of Table 5.1. The ordering of the rows does not affect
  the results of the test.)
◦ Click on Statistics... to open the Crosstabs: Statistics dialogue box.
◦ Check the Chi-square check box. Leave all other boxes unchecked.
◦ Click on Continue, then click on OK.




The following crosstabulation will be displayed in the Viewer window. 


dose * casecon Crosstabulation


Count
                         casecon
           Cases: Died
           of lung cancer    Controls    Total
50+        32                13          45
25-49      136               71          207
15-24      196               190         386
5-14       250               293         543
0-4        35                82          117
Total      649               649         1298





This crosstabulation contains the data from Table 5.1, with row totals added. 
Below this table you will see the following output in the Viewer window. 


Chi-Square Tests


                                Value     df    Asymp. Sig.
                                                (2-sided)
Pearson Chi-Square              50.             .000
Likelihood Ratio                51.             .000
Linear-by-Linear Association    46.219          .000
N of Valid Cases                1298

a. 0 cells (.0%) have expected count less than 5. The
minimum expected count is 22.50.


Look at the row labelled Linear-by-Linear Association. This gives details of 
the chi-squared test for no linear trend. The value of the test statistic is 46.219. 
The p value is reported as 0.000, so p < 0.0005. This provides strong evidence for 
a linear trend in the dose-response relationship between smoking and lung cancer. 


SPSS does not provide any other output for the analysis of dose-response
relationships. So, in particular, if you wish to find out whether the linear trend in
the dose-response relationship is increasing or decreasing, you must calculate the
dose-specific odds ratios by hand, using the entries in the dose*casecon
Crosstabulation table (or the original data in Table 5.1). (The analysis of
dose-response data is discussed in Section 8 of Book 1.)
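
The hand calculation of dose-specific odds ratios can be sketched in Python as
follows (an illustration only, not part of the module software). Two
assumptions are made: the lowest dose category, 0-4, is taken as the reference
category, and the controls count for 25-49 is taken as 71, the value consistent
with the column total of 649. The variable names are hypothetical.

```python
# Table 5.1: dose category -> (cases, controls).
table_5_1 = {
    "50+": (32, 13),
    "25-49": (136, 71),
    "15-24": (196, 190),
    "5-14": (250, 293),
    "0-4": (35, 82),
}

reference = "0-4"  # assumed reference category (lowest dose)
ref_cases, ref_controls = table_5_1[reference]

dose_specific_or = {}
for dose, (cases, controls) in table_5_1.items():
    # Odds of this dose versus the reference dose among cases, divided by
    # the same odds among controls.
    dose_specific_or[dose] = (cases * ref_controls) / (controls * ref_cases)

for dose, value in dose_specific_or.items():
    print(f"{dose:>6}  {value:.2f}")
```

Under these assumptions the odds ratios rise steadily with dose, which would
indicate an increasing trend.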


Computer Activity 5.2 Post-traumatic stress disorder in US veterans 


Data from a cohort study of post-traumatic stress disorder (PTSD) among US
veterans are in the SPSS file gulfdose.sav. (These data are described in
Activity 8.1 of Book 1.) The stress levels are ranked in six categories from
‘Minimal’ to ‘Extreme’, and the data suggest an increasing dose-response
relationship between stress levels and PTSD.


(a) Obtain a crosstabulation of the data, with the rows arranged in order of 
decreasing stress level. 


(b) Test the null hypothesis of no linear dose-response relationship between risk 
of PTSD and level of stress experienced, using the chi-squared test for no 
linear trend. Obtain the value of the test statistic. What do you conclude? 


Summary of Chapter 5 


In this chapter you have learned how to use SPSS to test for no linear trend in 
dose-response relationships for cohort studies and case-control studies. 




Computer Exercises on Book 1 


Computer Exercise 1 Invasive breast cancer and HRT 


In Example 2.2 of Book 1, data were presented from a large cohort study of 
hormone replacement therapy (HRT) and invasive breast cancer. The data relate 
to combined oestrogen-progestagen HRT. The group of women using 
oestrogen-only HRT is also of interest. The data for this group and for women 
who never used HRT are shown in Table E.1. 


Table E.1 Invasive breast cancer and use of oestrogen-only HRT


                                         Invasive breast cancer
Exposure category                        Yes     No         Total
Currently using oestrogen-only HRT       991     114 392    115 383
Never used HRT                           2894    389 863    392 757


Enter these data in SPSS. Use the variable names count, exposure and disease. 
For exposure, use the labels Currently using oestrogen HRT and Never used 
any HRT. For disease, use the labels Invasive breast cancer and No 
invasive breast cancer. Obtain a crosstabulation of the data. 


Computer Exercise 2 Analysis of the breast cancer and HRT data 


The data on oestrogen-only HRT in Table E.1 are in the SPSS data file 
oesthrt.sav. 


(a) Perform a chi-squared test for no association between use of oestrogen-only 
HRT and invasive breast cancer. Obtain the relative risk for invasive breast 
cancer, and a 95% confidence interval. 


(b) Summarize your results. 


Computer Exercise 3 Water fluoridation and tooth decay 


The SPSS data file caries.sav contains data on water fluoridation and caries
(tooth decay) in children aged 8-12 years, that are described in Activity 7.2 of
Book 1. (You entered these data in SPSS and obtained a crosstabulation in
Computer Activity 3.3.)


(a) Obtain the odds ratios and confidence intervals for the association between 
water fluoridation and dental caries within each age group. 


(b) Use Tarone’s test for homogeneity to test the null hypothesis that the 
underlying odds ratios in the four strata are equal. What do you conclude? 


(c) Use the Mantel-Haenszel test to test the null hypothesis that there is no 
association between water fluoridation and dental caries. 


(d) Obtain the Mantel-Haenszel estimate of the common odds ratio, and a 95% 
confidence interval, for the association between consumption of fluoridated 
water and dental caries. 


(e) Summarize your findings. 




Computer Exercise 4 Parental smoking and sudden infant deaths 


The SPSS data file smokesids.sav contains data from a case-control study to
investigate the association between parental smoking and sudden infant death
syndrome (SIDS) in babies aged under one year. The exposure is parental
smoking, classified in three groups: neither parent smokes (the reference
category), one parent smokes, two parents smoke.

Blair, P.S., Fleming, P.J., Bensley, D. et al. (1996) Smoking and the sudden
infant death syndrome: results from 1993-5 case-control study for confidential
inquiry into stillbirths and deaths in infancy. British Medical Journal, 313,
195-198.

The question of interest is whether there is a dose-response relationship between
the number of parents who smoke and the risk of SIDS.


(a) Obtain a crosstabulation of the data, with the rows arranged in order of 
decreasing exposure. 


(b) Calculate the dose-specific odds ratios relative to the reference category. 
(Recall that SPSS does not calculate these odds ratios.) 


(c) Test the hypothesis that there is no linear dose-response trend. What do you 
conclude? 




Learning outcomes 


You have been working to acquire the following skills in using SPSS. 




Enter data in tabular form in SPSS and display them in a table. 


Estimate relative risks (in cohort studies) and odds ratios (in cohort studies 
and case-control studies). 


Calculate expected frequencies for the cells of a contingency table under the 
null hypothesis of no association. 


Perform the chi-squared test for no association and Fisher’s exact test for 
2 x 2 tables and for larger tables. 


Enter stratified data in SPSS and display them in a table. 


Test for homogeneity of the odds ratio using Tarone’s test (for stratified 
data). 


Obtain the Mantel-Haenszel estimate of the common odds ratio and a 95% 
confidence interval for the odds ratio. 


Test the null hypothesis of no association for stratified data using the 
Mantel-Haenszel test. 


Perform McNemar’s test for no association in 1-1 matched case-control
studies.


Test for the presence of a dose-response relationship using the chi-squared 
test for no linear trend. 




Solutions to Computer Activities 


Solution 1.3 
Entering data is described in Computer Activity 1.1. The data format required by
SPSS is as shown in Table S.1.

Table S.1
Count  Exposure  Disease
647    1         1
622    1         2
2      2         1
27     2         2

The value labels for the variable exposure are as follows: Smoked for the value 1,
Never smoked for the value 2. The value labels for disease are Cases of lung
cancer for the value 1, Controls for the value 2. Remember to weight the cases
as described in Computer Activity 1.1.

The data file smoking.sav contains these data.

Obtaining a crosstabulation is described in Computer Activity 1.2. The
crosstabulation is as shown below.


exposure * disease Crosstabulation

Count
                                 disease
                         Cases of lung
                         cancer          Controls   Total
exposure  Smoked                   647        622    1269
          Never smoked               2         27      29
Total                              649        649    1298
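The effect of entering one row per cell and then weighting the cases by count can be mimicked outside SPSS. The following Python lines are only a minimal sketch of that idea, not part of the SPSS procedure: the names rows, cells, row_totals, column_totals and grand_total are chosen here for illustration, and the counts are those of Table S.1.

```python
from collections import Counter

# One row per cell of the 2 x 2 table: (count, exposure, disease),
# using the same codes as the SPSS data file.
rows = [
    (647, 1, 1),
    (622, 1, 2),
    (2, 2, 1),
    (27, 2, 2),
]
exposure_labels = {1: "Smoked", 2: "Never smoked"}
disease_labels = {1: "Cases of lung cancer", 2: "Controls"}

# Weighting by count: each row contributes 'count' individuals to its cell.
cells = Counter()
for count, exposure, disease in rows:
    cells[(exposure, disease)] += count

row_totals = {e: sum(cells[(e, d)] for d in disease_labels) for e in exposure_labels}
column_totals = {d: sum(cells[(e, d)] for e in exposure_labels) for d in disease_labels}
grand_total = sum(cells.values())

for e, label in exposure_labels.items():
    print(label, [cells[(e, d)] for d in disease_labels], row_totals[e])
print("Total", [column_totals[d] for d in disease_labels], grand_total)
```

The printed row and column totals should agree with the Total row and column of the crosstabulation.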





Solution 3.3 


(a) The method is described in Computer Activity 3.1. The data format required
by SPSS is as shown in Table S.2.

Table S.2
Count  Exposure  Outcome  Stratum
5      1         1        1
25     1         2        1
8      2         1        1
23     2         2        1
0      1         1        2
17     1         2        2
17     2         1        2
33     2         2        2
5      1         1        3
13     1         2        3
24     2         1        3
14     2         2        3
5      1         1        4
16     1         2        4
29     2         1        4
25     2         2        4


In this table, the exposure variable (in this case, water type) has been 
coded 1 for Fluoridated and 2 for Not fluoridated. The outcome has been 
coded 1 for With caries and 2 for Without caries. The stratifying variable is 
coded as specified in the question. 




Enter the data in the Data View panel of the Data Editor. Then click on
the Variable View tab and enter the variable names and the value labels for
the variables exposure, outcome and stratum. Also change the number of
decimal places to 0. Define the variable types in the column headed Measure:
choose Scale for count, Nominal for exposure and outcome, and Ordinal
for stratum. Then weight the cases using Weight Cases.

(b) The data may be saved using the method described in Computer Activity 1.1.

(c) Obtaining a crosstabulation is described in Computer Activity 3.2. You
should obtain the following table.


exposure * outcome * stratum Crosstabulation

Count
                                                       outcome
stratum                                       With caries  Without caries  Total
Age 8 years      exposure  Fluoridated                  5              25     30
                           Not fluoridated              8              23     31
                 Total                                 13              48     61
Age 9 years      exposure  Fluoridated                  0              17     17
                           Not fluoridated             17              33     50
                 Total                                 17              50     67
Age 10 years     exposure  Fluoridated                  5              13     18
                           Not fluoridated             24              14     38
                 Total                                 29              27     56
Age 11-12 years  exposure  Fluoridated                  5              16     21
                           Not fluoridated             29              25     54
                 Total                                 34              41     75
Total            exposure  Fluoridated                 15              71     86
                           Not fluoridated             78              95    173
                 Total                                 93             166    259





As in Computer Activity 2.4, SPSS displays a warning about the zero cell in 
the table. 


Solution 4.2 


The output required for this activity may be obtained as follows. 




Obtain the Crosstabs dialogue box. 


Enter exposure in the Row(s) field, casecon in the Column(s) field and
stratum in the Layer 1 of 1 field.


Click on Statistics... to open the Crosstabs: Statistics dialogue box. 


Check the Risk box and the Cochran’s and Mantel-Haenszel statistics 
box. 


Click on Continue, then click on OK. 


(a) The stratum-specific odds ratios and confidence intervals are in the Risk
Estimate output table. For married drivers, the odds ratio is 16.480, with
95% confidence interval (3.354, 80.970). For unmarried drivers, the odds ratio
is 28.667, with 95% confidence interval (5.857, 140.316). The unadjusted odds
ratio is 25.550.

(b) The results of Tarone’s test are in the Tests of Homogeneity of the Odds
Ratio table. The value of the test statistic is 0.236. Since there are two
strata, there is 1 degree of freedom. The p value is 0.627. This indicates that
there is little evidence that the odds ratio is different for married and
unmarried drivers.


The method and the output are 
described in Computer 
Activity 4.1. 




(c) The Mantel-Haenszel Common Odds Ratio Estimate table contains the
Mantel-Haenszel estimate of the common odds ratio. This is 23.001, with
95% confidence interval (7.465, 70.866).

The results of the Mantel-Haenszel test are in the Tests of Conditional
Independence table. The value of the test statistic is 36.604, and there is
1 degree of freedom. The p value for the test is quoted as 0.000, so
p < 0.0005. Hence there is strong evidence that the underlying odds ratio is
different from 1. Since the estimate (23.001) is greater than 1, you can
conclude that there is strong evidence of a positive association between blood
alcohol level and dying in a car crash.

(d) There is strong evidence of an association between alcohol levels in the blood
and dying in a car accident (Mantel-Haenszel test, p < 0.0005). The odds
ratio, adjusted for marital status, is 23.0 with 95% confidence interval
(7.47, 70.9), suggesting a strong positive association between levels of alcohol
in the blood and dying in a car accident. The stratum-specific odds ratios are
much greater than 1 in both married and unmarried drivers. There is little
evidence of an interaction with marital status (Tarone’s test, p = 0.63).


Solution 4.4 


(a) The Mantel-Haenszel odds ratio is $\widehat{OR}_{MH} = 8/3 \simeq 2.67$.

(b) Look at the data in the Data Editor to check that the correct data entry
format has been used. A further check is to verify that the tabulation in the
cases*controls Crosstabulation output table is the same as that in
Table 4.3.

The p value for McNemar’s test is in the Chi-Square Tests output table.
The p value for the test is 0.227. Thus there is little evidence of an association
between educational level and risk of Alzheimer’s disease. However, note that
there were only 11 discrepant pairs (3 + 8). Thus it is quite possible that this
study was too small to detect even a moderately strong association.
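The quoted odds ratio and p value depend only on the 11 discrepant pairs, so they can be checked by hand. The Python lines below are a minimal sketch under two assumptions that are not stated in this solution: that the 8 pairs are those in which only the case was exposed, and that the p value reported by SPSS is the exact two-sided binomial one. The function name mcnemar_exact_p is introduced here for illustration.

```python
from math import comb

def mcnemar_exact_p(f, g):
    """Two-sided exact p value for McNemar's test.

    f and g are the numbers of discrepant pairs of each type. Under the
    null hypothesis, f follows a binomial distribution B(f + g, 0.5).
    """
    n = f + g
    k = min(f, g)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)

# Discrepant pairs (assumed orientation: 8 with only the case exposed,
# 3 with only the control exposed).
f, g = 8, 3

or_mh = f / g                    # Mantel-Haenszel odds ratio for 1-1 matched pairs
p_value = mcnemar_exact_p(f, g)

print(round(or_mh, 2), round(p_value, 3))
```

With so few discrepant pairs, an exact calculation of this kind is preferable to a large-sample chi-squared approximation.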


Solution 5.2 


(a) The method is described in Computer Activity 5.1. The data corresponding
to the Extreme level of dose should be given first in the dose*outcome
Crosstabulation output table.

(b) The value of the test statistic for the chi-squared test for no linear trend is in
the row labelled Linear-by-Linear Association in the Chi-Square Tests
output table. It is 455.153. The p value is reported as 0.000, so p < 0.0005.
Thus there is strong evidence for a linear dose-response relationship between
risk of PTSD and level of stress experienced.




The method is described in 
Computer Activity 4.3. 


Solutions to Computer Exercises 


Solution 1 


This computer exercise covers some of the ideas and techniques discussed in 
Chapter 1. 


Data entry in SPSS is described in Computer Activity 1.1. The format for data 
entry in SPSS is as shown in Table S.3. 


Table S.3
Count   Exposure  Disease
991     1         1
114392  1         2
2894    2         1
389863  2         2

The value labels for the variable exposure are Currently using oestrogen HRT
for 1 and Never used any HRT for 2. The value labels for disease are Invasive
breast cancer for 1 and No invasive breast cancer for 2. Weight the cases
as described in Computer Activity 1.1. Obtaining a crosstabulation is described
in Computer Activity 1.2. The crosstabulation is as shown below.

Note that the data file oesthrt.sav contains these data.


exposure * disease Crosstabulation

Count
                                            disease
                                   Invasive       No invasive
                                   breast cancer  breast cancer   Total
exposure  Currently using
          oestrogen HRT                     991         114392   115383
          Never used any HRT               2894         389863   392757
Total                                      3885         504255   508140





Solution 2 


This computer exercise covers some of the ideas and techniques discussed in 
Sections 2 and 4 of Book 1 and in Chapter 2 of this computer book. 


(a) Use the methods described in Computer Activity 2.1 and Computer 
Activity 2.3. Obtain the Crosstabs: Statistics dialogue box, and check 
both the Risk check box and the Chi-square check box. The output will 
contain tables labelled Chi-Square Tests and Risk Estimate. 


The details of the test for no association are in the row labelled Pearson 
Chi-Square of the Chi-Square Tests table. The test statistic is 17.506. The 
test is valid as there are no cells with expected frequency less than 5. There 
is 1 degree of freedom and the p value is reported as 0.000, so p < 0.0005. 


The Risk Estimate table shows the relative risk estimates. Since this is a 
cohort study, relative risks are appropriate summaries of association. The 
relative risk for invasive breast cancer is 1.166, with 95% confidence interval 
(1.085, 1.252). 


(b) There is strong evidence of an association between current use of 
oestrogen-only HRT and invasive breast cancer (chi-squared test, p < 0.0005). 
The relative risk for invasive breast cancer is 1.17, with 95% confidence 
interval (1.09, 1.25). 
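The figures quoted in this solution can be checked directly from the four counts in Table E.1. The Python lines below are a sketch only: they assume the Pearson statistic without continuity correction and the usual large-sample interval for the relative risk calculated on the log scale, and the function names are introduced here for illustration.

```python
import math

# Counts from Table E.1.
a, b = 991, 114392     # current users: cancer, no cancer
c, d = 2894, 389863    # never users:   cancer, no cancer

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for a 2 x 2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def relative_risk(a, b, c, d, z=1.96):
    """Relative risk with an approximate 95% confidence interval (log scale)."""
    n1, n2 = a + b, c + d
    rr = (a / n1) / (c / n2)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    return rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)

chi2 = chi_squared_2x2(a, b, c, d)
rr, lower, upper = relative_risk(a, b, c, d)

print(round(chi2, 3), round(rr, 3), round(lower, 3), round(upper, 3))
```

The values obtained should agree, to the accuracy quoted, with the Pearson Chi-Square row and the Risk Estimate table described in part (a).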




Solution 3 


This exercise covers some of the ideas and techniques discussed in Section 7 of 
Book 1 and in Chapter 4 of this computer book. 


To obtain the results required for parts (a) to (d), use the method described in 
Computer Activity 4.1, and the summary in the paragraph following the activity. 


(a) The stratum-specific odds ratios are in the Risk Estimate output table. The
following results were extracted from this table. For eight-year-olds, the
estimated odds ratio is 0.575, with 95% confidence interval (0.164, 2.012). For
nine-year-olds, SPSS has not calculated the odds ratio because of the zero
cell; the estimate is 0. For ten-year-olds, OR = 0.224, with 95% confidence
interval (0.066, 0.763). For children aged 11-12 years, OR = 0.269, with 95%
confidence interval (0.086, 0.840).


(b) Details of Tarone’s test for homogeneity are given in the Tests of 
Homogeneity of the Odds Ratio output table. The value of the test 
statistic is 3.960, on 3 degrees of freedom. The p value is 0.266. So there is 
little evidence that the odds ratios differ between the four age groups. 


(c) The results of the Mantel-Haenszel test of the null hypothesis of no 
association are in the Tests of Conditional Independence table. The 
value of the test statistic is 16.528 on 1 degree of freedom, and the p value is 
less than 0.0005. 


(d) The results required are in the Mantel-Haenszel Common Odds Ratio 
Estimate table. The Mantel-Haenszel estimate of the odds ratio, adjusted 
for the effect of age, is 0.248, with 95% confidence interval (0.127, 0.484). 


(e) Consumption of fluoridated water is negatively associated with dental caries
in children aged 8-12 years. There is little evidence that the odds ratios differ
between age groups (Tarone’s test, p = 0.27). There is strong evidence of an
association between dental caries and consumption of fluoridated water
(Mantel-Haenszel test, p < 0.0005). The Mantel-Haenszel odds ratio,
adjusted for age, is 0.25 with 95% confidence interval (0.13, 0.48).
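The stratum-specific odds ratios, the Mantel-Haenszel estimate and the Mantel-Haenszel test statistic quoted above can be reproduced from the sixteen counts of the caries data. The following Python lines are a minimal sketch, not the SPSS algorithm itself: the function names are introduced here, and the continuity correction of 0.5 in the test statistic is an assumption that happens to reproduce the quoted value. Confidence intervals and Tarone's test are not attempted.

```python
# Each stratum: (a, b, c, d) =
# (fluoridated with caries, fluoridated without caries,
#  not fluoridated with caries, not fluoridated without caries).
strata = {
    "Age 8 years": (5, 25, 8, 23),
    "Age 9 years": (0, 17, 17, 33),
    "Age 10 years": (5, 13, 24, 14),
    "Age 11-12 years": (5, 16, 29, 25),
}

def odds_ratio(a, b, c, d):
    """Odds ratio for a single 2 x 2 table (requires b * c > 0)."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel estimate of the common odds ratio."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

def mantel_haenszel_test(tables, correction=0.5):
    """Mantel-Haenszel chi-squared statistic (1 degree of freedom)."""
    observed = expected = variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        observed += a
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    return (abs(observed - expected) - correction) ** 2 / variance

stratum_ors = {name: odds_ratio(*t) for name, t in strata.items()}
or_mh = mantel_haenszel_or(strata.values())
mh_stat = mantel_haenszel_test(strata.values())

for name, value in stratum_ors.items():
    print(name, round(value, 3))
print(round(or_mh, 3), round(mh_stat, 3))
```

The zero cell for nine-year-olds gives an estimated odds ratio of 0 in that stratum, but it causes no difficulty for the Mantel-Haenszel estimate, which pools the strata before taking the ratio.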


Solution 4 


This exercise covers some of the ideas and techniques discussed in Subsection 8.2 
of Book 1 and in Chapter 5 of this computer book. 


(a) The crosstabulation required is as shown below. 


dose * casecon Crosstabulation

Count
                                      casecon
                              Case: Died of
                              SIDS            Control   Total
dose  Both parents smokers
      One parent smoker
      No parent smoker





(b) The estimated odds ratio for ‘Both parents smokers’ versus ‘No parent
smoker’ is

$\widehat{OR} = \dfrac{44 \times 421}{\cdots} \simeq 7.29.$

The estimated odds ratio for ‘One parent smoker’ versus ‘No parent smoker’ is

$\widehat{OR} = \cdots$




This table may be obtained 
using the method described in 
Computer Activity 5.1. (Select 
Descending in the Crosstabs: 
Table Format dialogue box.) 




(c) The details of the test for no linear trend are given in the Linear-by-Linear
Association row of the Chi-Square Tests output table. The test statistic
for the chi-squared test for no linear trend is χ² = 75.059. The p value for the
test is less than 0.0005. Thus there is strong evidence of a linear trend in the
dose-response relationship between the number of parents who smoke and
SIDS. From part (b), the trend is increasing.

The test for no linear trend in SPSS is described in Computer Activity 5.1.
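The two calculations used in this solution, dose-specific odds ratios and the statistic for no linear trend, can be sketched in a few lines of Python. The counts below are hypothetical and are not the smokesids.sav data; the function names are introduced here, and the trend statistic is assumed to be the linear-by-linear form (N - 1)r², where r is the correlation between the dose score and case status.

```python
# Hypothetical counts, in order of decreasing exposure
# (both parents smoke, one parent smokes, neither parent smokes).
scores = [2, 1, 0]
cases = [30, 20, 10]       # hypothetical
controls = [10, 20, 30]    # hypothetical

def dose_specific_odds_ratios(cases, controls):
    """Odds ratios for each dose level relative to the last (reference) level."""
    ref_cases, ref_controls = cases[-1], controls[-1]
    return [
        (x * ref_controls) / (y * ref_cases)
        for x, y in zip(cases[:-1], controls[:-1])
    ]

def linear_trend_statistic(scores, cases, controls):
    """Linear-by-linear statistic (N - 1) * r**2, on 1 degree of freedom."""
    n = sum(cases) + sum(controls)
    totals = [x + y for x, y in zip(cases, controls)]
    sum_x = sum(s * t for s, t in zip(scores, totals))
    sum_xx = sum(s * s * t for s, t in zip(scores, totals))
    sum_y = sum(cases)                      # case status scored 1, control 0
    sum_xy = sum(s * x for s, x in zip(scores, cases))
    sxx = sum_xx - sum_x ** 2 / n
    syy = sum_y - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    return (n - 1) * sxy ** 2 / (sxx * syy)

odds_ratios = dose_specific_odds_ratios(cases, controls)
trend = linear_trend_statistic(scores, cases, controls)

print(odds_ratios, round(trend, 3))
```

For these hypothetical counts the odds ratios increase with dose, which is the pattern that a large value of the trend statistic is designed to detect.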




Index 


1-1 matched case-control studies 19, 22

chi-squared test for no association 11
chi-squared test for no linear trend 26
confidence interval 10, 20, 21
Crosstabs 8
Crosstabs: Cell Display 11
Crosstabs: Statistics 10, 12, 22
Crosstabs: Table Format 25
crosstabulation 8

displaying stratified tables 17
displaying tabular data 8

entering stratified tabular data 16
entering tabular data 5
Exact Tests 15
expected frequencies 11

Fisher’s exact test 13

Mantel-Haenszel chi-squared test 19, 21
Mantel-Haenszel odds ratio 19, 21
McNemar’s test 22

nominal variables 7

odds ratio 9
ordinal variables 7

relative risk 9

saving a data file 8
scale variables 7
stratified analysis 20, 22

Tarone’s test 19, 20

Value Labels 6

Weight Cases 7, 17


